Noah-FetchRewards opened a new issue, #1180: URL: https://github.com/apache/datafusion-ballista/issues/1180
**Describe the bug**

I'm deploying a Ballista cluster on Kubernetes (AWS EKS) using the documentation/YAML files at https://datafusion.apache.org/ballista/user-guide/deployment/kubernetes.html. I'm trying to run the `remote-sql.rs` example to confirm the cluster works, but I can't get it working. I uploaded the `aggregate_test_100.csv` file to the `/mnt` directory on the Ballista scheduler, but I repeatedly get the error:

```
Error: ObjectStore(NotFound { path: "/mnt/aggregate_test_100.csv", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } })
```

I can confirm the scheduler has the file because I can `sh` into the pod and view the file with `ls`. Here is the code:

```rust
#[tokio::main]
async fn main() -> Result<()> {
    let config = SessionConfig::new_with_ballista()
        .with_target_partitions(4)
        .with_ballista_job_name("Remote SQL Example");

    let state = SessionStateBuilder::new()
        .with_config(config)
        .with_default_features()
        .build();

    let ctx = SessionContext::remote_with_state("df://external_ip:50050", state).await?;

    ctx.register_csv("test", "/mnt/aggregate_test_100.csv", CsvReadOptions::new())
        .await?;

    let df = ctx
        .sql("SELECT c1, MIN(c12), MAX(c12) FROM test WHERE c11 > 0.1 AND c11 < 0.9 GROUP BY c1")
        .await?;

    // Print the query results
    df.show().await?;

    Ok(())
}
```

I've also tried the original example, which references the file locally; as expected, that didn't work either:
```rust
ctx.register_csv(
    "test",
    &format!("{test_data}/aggregate_test_100.csv"),
    CsvReadOptions::new(),
)
.await?;
```

What I really want is to reference a file in an S3 bucket, so I initially tried:

```rust
#[tokio::main]
async fn main() -> Result<()> {
    let s3_store = object_store::aws::AmazonS3Builder::new()
        .with_bucket_name("ballista-noah-2")
        .with_access_key_id("my key id")
        .with_secret_access_key("my key")
        .with_token("my token")
        .with_region("us-east-1")
        .build()?;

    let runtime_env = RuntimeEnvBuilder::new().build()?;

    let s3_url = Url::parse("s3://ballista-noah-2")
        .map_err(|e| datafusion::error::DataFusionError::External(Box::new(e)))?;
    runtime_env.register_object_store(&s3_url, Arc::new(s3_store));

    let session_config = SessionConfig::new_with_ballista()
        .with_target_partitions(4)
        .with_ballista_job_name("Remote SQL Example");

    let state = SessionStateBuilder::new()
        .with_config(session_config)
        .with_runtime_env(Arc::new(runtime_env))
        .with_default_features()
        .build();

    let ctx = SessionContext::remote_with_state("df://127.0.0.1:50050", state).await?;

    ctx.register_csv(
        "test",
        "s3://ballista-noah-2/aggregate_test_100.csv",
        CsvReadOptions::new(),
    )
    .await?;

    let df = ctx
        .sql("SELECT c1, MIN(c12), MAX(c12) FROM test WHERE c11 > 0.1 AND c11 < 0.9 GROUP BY c1")
        .await?;

    df.show().await?;

    Ok(())
}
```

This results in the error:

```
Error: ArrowError(ExternalError(Execution("Job LyVMWvI failed: Error planning job LyVMWvI: DataFusionError(Internal(\"No suitable object store found for s3://ballista-noah-2/aggregate_test_100.csv. See `RuntimeEnv::register_object_store`\"))")), None)
```

I've been trying many different variations here. I'm definitely doing something wrong, and I'm hoping someone can point me in the right direction.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.